Abstract

Different numerical mappings of DNA sequences have been studied using a
new cluster-scaling method and well-known spectral methods. It is shown,
in particular, that the nucleotide sequences in DNA molecules have robust
cluster-scaling properties. These properties are relevant to both types of
base-pair interactions: hydrogen bonds and stacking interactions. It is
shown that taking the cluster-scaling properties into account can help to
improve heterogeneous models of DNA dynamics. It is also shown that a
chaotic (deterministic) order, rather than stochastic randomness, controls
the positions of the energy minima of the stacking interactions in DNA
sequences on large scales. The chaotic order results in a large-scale
chaotic coherence between the sequences of the two complementary strands
of the DNA duplex. A competition between this broad-band chaotic coherence
and the resonance coherence produced by the genetic code is briefly
discussed. The Arabidopsis plant genome (a model plant for genome
analysis) and two human genes, BRCA2 and NRXN1, are considered as
examples.



1 Introduction 

A DNA molecule carries information in the form of four chemical groups or 
nucleotide bases: adenine, cytosine, guanine, and thymine, represented by 
the letters A, C, G and T. The order of bases on a DNA strand is the DNA 
sequence. If we read along one of the two DNA-helix sides we get text like 
GATACA... In the double-stranded DNA, the two strands run in opposite 
Figure 1: The stacking energies for different stacked base pairs.

directions and the bases pair up such that A always pairs with T and G
always pairs with C, because these particular pairs fit exactly to form
effective hydrogen bonds with each other. The A-T base pair has 2 hydrogen
bonds and the G-C base pair has 3 hydrogen bonds. The G-C interaction is
therefore stronger than the A-T one, and A-T rich regions of DNA are more
prone to thermal fluctuations and more likely to serve as initiation sites
(origins) at the unwinding stage of DNA replication. The bases are
oriented perpendicular to the DNA-helix axis. Constant thermal
fluctuations result in local twisting, stretching, bending, and unwinding
of the double strands. In solution DNA assumes a linear configuration
because it is the one of minimum energy. The helix axis of DNA in vivo is
usually strongly curved: the stretched length of the human genome, for
instance, is about 1 meter, and this length needs to be "packaged" in
order to fit in the nucleus of a cell (the diameter of the nucleus of a
typical human somatic cell is about $5 \times 10^{-6}$ meters).
Therefore, the DNA has to be highly organized. This packaging deforms the
DNA physically, thereby increasing its energy (it is less stable than
relaxed DNA because of less than optimal base stacking). In this situation
some strain is relieved by supercoiling: the helix bends and twists to
achieve a better base-stacking orientation despite having too many
bp/turn. The difference between the A-T and G-C interactions can be used
for optimizing the free energy. The base-pair stacking energies (the main
stabilizing factor in the DNA duplex, see for instance Ref. [1]) are
highly dependent on the base sequence [2]. These interactions come partly
from the overlap of the $\pi$ electrons of the bases and partly from
hydrophobic interactions. Quantum chemistry calculations give rather
different energies for different stacked base pairs: Fig. 1 (cf. Ref.
[3]). Therefore, a certain clustering of the base pairs can be used by
nature in order to minimize the excess energy that builds up when DNA
molecules are deformed during packaging. The physical constraints imposed
by the supercoiling of the DNA sequence, in particular those related to
the positioning of nucleosomes along the sequence [I], play a significant
role in creating the clustering.

Moreover, the increase in stored (potential) energy within the molecule
is then available to drive reactions such as the unwinding events that
occur during DNA replication. Before replication of DNA can occur, the
length of the DNA double helix about to be copied must be unwound and the
two strands must be separated by breaking the hydrogen bonds that link
the paired bases. The process of replication begins in the DNA molecules
at thousands of sites called origins of replication. Because the location
and time of initiation of origins are generally stochastic, the time to
finish replication is also a stochastic quantity. The random distribution
of origin firing raises the random gap problem: a random distribution will
lead to occasional large gaps that should take a long time to replicate.
Despite this, each cell in a population must complete the replication
process in an accurate and timely
manner (see for instance, Refs. Different solutions to this problem
have been suggested (see, for instance, Refs. [7],[8],[9],[10]).

If the spacing of origins is not completely random, then any regularity
in the spacing will tend to suppress the large gaps [7]. For instance,
origins within specific clusters could be preferred to fire [H],[T2].
Since a G-C base pair, with three hydrogen bonds, is expected to be harder
to break than an A-T base pair with only two bonds, a clustering of these
two kinds of base pairs can help to solve the random origin firing
problem. The stacking interactions can also contribute to the solution of
this problem. It will be shown below that a chaotic (deterministic) order,
rather than stochastic randomness, controls the positions of the energy
minima of the stacking interactions in DNA sequences on large scales. This
chaotic order not only introduces a regularity into the spacing of the
origins but also results in a long-range coherence between the two
complementary strands of the DNA duplex.

2 Cluster-scaling 

Because these processes involve many orders of magnitude in spatial
scale, one can expect that the clustering will exhibit scale-invariant
properties (see, for instance, Ref. [IS]). A cluster-scaling for
stochastic systems was recently suggested in Refs. [2], [T5], [T6]. The
genome data can be readily checked for cluster-scaling properties in a
"1 or 0" mapping. In this representation (see, for instance, Refs.
[H],[23] and references therein) one puts A=1 and C=G=T=0 in an original
DNA sequence to obtain an A-dominated sub-sequence (one can obtain C-, G-,
or T-dominated sub-sequences in
Figure 2: The standard deviation of $\delta n(r)$ vs. $r$ for the
T-dominated sub-sequence of the Arabidopsis genome (in log-log scales).
The straight line (the best fit) indicates the scaling law Eq. (2).

an analogous way). Then, to study statistical clustering in a
sub-sequence $\{a_i\}$ (where $a_i = 1$ or $0$ and $i = 1, 2, \ldots$),
one takes the running average

$$ n_j(r) = \frac{1}{r} \sum_{i=j}^{j+r} a_i \qquad (1) $$

along the sub-sequence. For the 1 or 0 mapping this running average
represents the weight (density) of the sub-sequence in the interval
$[j, j+r]$. Following Ref. [14], we are interested in the scaling
variation of the standard deviation of the running density fluctuations
$\langle (\delta n_j(r))^2 \rangle^{1/2}$ with $r$:

$$ \langle (\delta n_j(r))^2 \rangle^{1/2} \sim r^{-\alpha} \qquad (2) $$

where $\langle \ldots \rangle$ denotes an average over the sub-sequence
and $\delta n_j(r) = n_j(r) - \langle n_j(r) \rangle$. The power law,
Eq. (2), corresponds to a scale-invariant (scaling) behavior.
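
As an illustration of the mapping and of the running average, the following minimal Python sketch builds a 0/1 sub-sequence and computes the standard deviation of its running density. The function names (`indicator_map`, `running_density`, `density_std`) are hypothetical, and the window is assumed to contain exactly r terms so that the running density is a true average (Eq. (1) as printed sums r+1 terms).

```python
import math


def indicator_map(seq, symbols):
    """Map a DNA string to 0/1: 1 where the base is in `symbols`, else 0."""
    return [1 if base in symbols else 0 for base in seq.upper()]


def running_density(a, r):
    """Running average of a 0/1 sequence over windows of r consecutive sites.

    Assumption: the window holds exactly r terms (a_j ... a_{j+r-1}).
    """
    prefix = [0]
    for value in a:
        prefix.append(prefix[-1] + value)
    return [(prefix[j + r] - prefix[j]) / r for j in range(len(a) - r + 1)]


def density_std(a, r):
    """Standard deviation of the running density fluctuations for window r."""
    n = running_density(a, r)
    mean = sum(n) / len(n)
    return math.sqrt(sum((x - mean) ** 2 for x in n) / len(n))


t_map = indicator_map("GATTACA", "T")   # T-dominated sub-sequence
print(t_map)                            # [0, 0, 1, 1, 0, 0, 0]
print(running_density(t_map, 2))        # [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
```

Prefix sums keep the cost linear in the sequence length for every r, which matters for genome-sized inputs.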

The exponent $\alpha$ in Eq. (2) was called the cluster-exponent in
Ref. [H]. For white noise zeros (intersections of a white noise signal
with the time axis) it can be derived analytically that $\alpha = 1/2$
(see Ref. [H] and references therein). This value can be considered as an
upper limit (the non-clustering case) for the cluster-exponent. If
$0 < \alpha < 0.5$ we have a cluster-scaling situation, and the
cluster-scaling is stronger for smaller values of $\alpha$ (see Ref. [14]
for examples).
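
The cluster-exponent can be estimated as minus the slope of a least-squares straight line in log-log coordinates. The sketch below is only an illustration; the fitting procedure actually used for the figures is not specified in the text, and `fit_cluster_exponent` is a hypothetical helper. As a sanity check, an uncorrelated random 0/1 sequence (assumed here as a stand-in for the non-clustering case) should give an exponent close to 1/2.

```python
import math
import random


def fit_cluster_exponent(r_values, std_values):
    """Return alpha from a least-squares fit of log(std) = c - alpha * log(r)."""
    xs = [math.log(r) for r in r_values]
    ys = [math.log(s) for s in std_values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope


def density_std(a, r):
    """Standard deviation of the running average over windows of r sites."""
    prefix = [0]
    for value in a:
        prefix.append(prefix[-1] + value)
    n = [(prefix[j + r] - prefix[j]) / r for j in range(len(a) - r + 1)]
    mean = sum(n) / len(n)
    return math.sqrt(sum((x - mean) ** 2 for x in n) / len(n))


# Uncorrelated 0/1 sequence: no clustering, so alpha should be near 0.5.
rng = random.Random(1)
noise = [1 if rng.random() < 0.25 else 0 for _ in range(20000)]
r_values = [4, 8, 16, 32, 64]
alpha = fit_cluster_exponent(r_values, [density_std(noise, r) for r in r_values])
print(round(alpha, 2))
```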
Figure 3: The standard deviation of $\delta n(r)$ vs. $r$ for the
T-dominated sub-sequence of gene BRCA2 (in log-log scales). The straight
line (the best fit) indicates the scaling law Eq. (2).

In this paper we present, as examples, results obtained for the genome
sequence of the flowering plant Arabidopsis thaliana, which is a model
plant for genome analysis [TT], and for two human genes, BRCA2 and NRXN1.

Let us start with Arabidopsis. Its genome is one of the smallest plant
genomes (about 157 million base pairs and five chromosomes), which makes
Arabidopsis thaliana useful for genetic mapping and sequencing. The most
up-to-date version of the Arabidopsis thaliana genome is maintained by The
Arabidopsis Information Resource (TAIR) (see, for instance,
http://www.plantgdb.org/). The results of computations for the
Arabidopsis genome sequences are shown in Fig. 2. We show in Fig. 2 the
results for the T-dominated sub-sequence; the results for the A-, C-, and
G-dominated sub-sequences are similar. Figure 2 shows (in log-log scales)
the dependence of the standard deviation of the running density
fluctuations $\langle (\delta n_j(r))^2 \rangle^{1/2}$ on $r$ for the
T-dominated sub-sequence. The straight line is drawn in this figure to
indicate the scaling law, Eq. (2). The slope of this straight line
provides the cluster-exponent $\alpha = 0.33 \pm 0.02$. The results of
computations for the genome sequences associated with the genes BRCA2 and
NRXN1 are shown in figures 3 and
Figure 4: The standard deviation of $\delta n(r)$ vs. $r$ for the
T-dominated sub-sequence of gene NRXN1 (in log-log scales). The straight
line (the best fit) indicates the scaling law Eq. (2).



4, respectively (the full set of the genome sequences can be found at
http://www.ncbi.nlm.nih.gov). The molecular location of gene BRCA2 is on
chromosome 13: base pairs 32,889,616 to 32,973,808. The BRCA2 gene helps
prevent cells from growing and dividing too rapidly or in an uncontrolled
way. By helping repair DNA, BRCA2 plays a role in maintaining the
stability of a cell's genetic information. Gene NRXN1 (neurexin 1) is
among the largest known human genes; its molecular location is on
chromosome 2: base pairs 50,145,642 to 51,259,673. The NRXN1 gene is a
strong candidate for involvement in the etiology of nicotine dependence,
and even subtle changes in NRXN1 might contribute to susceptibility to
autism.

We show in Figs. 3 and 4 the results for the T-dominated sub-sequences;
the results for the A-, C-, and G-dominated sub-sequences are similar (for
each gene respectively). Figure 3 shows (in log-log scales) the dependence
of the standard deviation of the running density fluctuations
$\langle (\delta n_j(r))^2 \rangle^{1/2}$ on $r$ for the T-dominated
sub-sequence of gene BRCA2. The straight line is drawn in this figure to
indicate the scaling law, Eq. (2). The slope of this straight line
provides the cluster-exponent $\alpha = 0.30 \pm 0.02$. Figure 4 shows the
analogous result for gene NRXN1, with $\alpha = 0.35 \pm 0.02$. One can
see that in both cases we have rather strong cluster-scaling, with the
cluster-exponent different for the different genes. The main consequence
of the finite-size effects for the cluster-scaling is a wavy character of
the scaling data in log-log scales (cf. Ref. [E]).

Figure 5: The standard deviation of $\delta n(r)$ vs. $r$ for the A&T
dominated sub-sequence (circles) of gene NRXN1 (in log-log scales). The
straight lines (the best fit) indicate the scaling law Eq. (2).

3 Hydrogen bonds 

The most popular potential for modeling the hydrogen (H) bond within a
base pair in DNA chains is the Morse potential (see, for instance, Refs.
[25],[26],[27],[28]):

$$ V(y_i) = D_i \left( e^{-a y_i} - 1 \right)^2 \qquad (3) $$

where $D_i$ is the site-dependent dissociation energy of the $i$th pair,
which can take two values, $D_i = D^{A-T}$ and $D_i = D^{C-G}$, for the
A-T and the C-G pairs at the $i$th site respectively (the A-T pair
includes two H bonds, while the C-G pair includes three H bonds, see
Introduction); $a^{-1}$ is a measure of the potential well width; and the
variable $y_i$ is the dynamical deviation of the H bonds from their
equilibrium lengths at position $i$. The ratio $D^{C-G}/D^{A-T} = 1.5$ is
often
Figure 6: The standard deviation of $\delta n(r)$ vs. $r$ for the AT-TA
(circles) and CG-GC (filled circles) dominated sub-sequences of gene BRCA2
(in log-log scales). The straight lines (the best fit) indicate the
scaling law Eq. (2).

used for modeling purposes (though recent quantum chemical calculations
[29] result in a ratio $D^{C-G}/D^{A-T} = 2$).
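
A minimal sketch of how a site-dependent Morse potential could be evaluated along a sequence is given below. It assumes the standard Morse form $V(y) = D (e^{-ay} - 1)^2$; the numerical values of `D_AT` and of the parameter `A_INV_WIDTH`, as well as the function names, are illustrative assumptions and are not taken from the text. Only the ratio 1.5 comes from the discussion above.

```python
import math

# Illustrative parameter values (assumptions, not taken from the text):
# dissociation energies in eV with the often used ratio D_CG / D_AT = 1.5.
D_AT = 0.05
D_CG = 1.5 * D_AT
A_INV_WIDTH = 4.2  # the parameter a, in inverse angstroms


def morse_potential(y, d, a=A_INV_WIDTH):
    """Morse potential V(y) = D * (exp(-a*y) - 1)**2 for stretching y."""
    return d * (math.exp(-a * y) - 1.0) ** 2


def dissociation_energies(seq):
    """Site-dependent D_i: D_AT for A or T at site i, D_CG otherwise.

    Assumes the sequence contains only the letters A, C, G and T.
    """
    return [D_AT if base in "AT" else D_CG for base in seq.upper()]


energies = dissociation_energies("GATC")
print(energies)
print([morse_potential(0.1, d) for d in energies])
```

The potential vanishes at the equilibrium position y = 0 and saturates at the dissociation energy D for large stretching, which is what makes the C-G sites harder to open than the A-T sites.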

Bivalued H-bond coupling strengths $D^{A-T}$ and $D^{G-C}$, randomly
distributed along the DNA chain, are usually used in models of DNA
dynamics. This would be appropriate for an arbitrary base-pair sequence.
However, as follows from the previous section, a homogeneous random
distribution is not realistic even for the longest genes, such as NRXN1
(see, for instance, Ref. [30]). The dynamic heterogeneous properties of
the DNA molecule were considered, for instance, as a reason for the
so-called multi-step melting [31]. However, the assumption of a random
and short-range (delta-) correlated sequence made in Ref. [31] does not by
itself result in multi-step melting; only an additional assumption of
extra backbone stiffness due to the double-stranded conformation of the
DNA molecule allowed the authors to observe multi-step melting in their
model. In the model suggested in Ref. [32] the sequence randomness is
considered as a quenched noise with a finite sequence correlation length.
In this approach regions dominated by A-T or, alternatively, by C-G pairs
play a significant role in the formation of bubbles (i.e. locally
denatured states).

Taking into account the cluster-scaling of the DNA nucleotides is a natural 
Figure 7: The standard deviation of $\delta n(r)$ vs. $r$ for the AT-TA
(circles) and CG-GC (filled circles) dominated sub-sequences of gene NRXN1
(in log-log scales). The straight lines (the best fit) indicate the
scaling law Eq. (2).

step toward a more realistic dynamical model. Because of the bivalued
H-bond coupling strengths, $D_i = D^{A-T}$ or $D_i = D^{G-C}$, this can be
readily done using the following bivalued mapping: A = T = 1, C = G = 0
or, alternatively, C = G = 1, A = T = 0. Figure 5 shows the
cluster-scaling behavior, Eq. (2), for the former mapping of the NRXN1
gene. The cluster-exponent is $\alpha = 0.32 \pm 0.02$ in this case. The
calculations for the C = G = 1, A = T = 0 mapping give the same result, as
do those for the corresponding mappings of the gene BRCA2 (an indication
of universality). Therefore, the bivalued sequences of the $D_i$
coefficients for the DNA dynamic chain should be chosen as cluster-scaling
ones with a certain cluster-exponent $\alpha$, Eq. (2) (for the considered
genes $\alpha \simeq 0.32$).

4 Stacking interaction 

Two factors are mainly responsible for the stability of the DNA double
helix: base pairing between complementary strands and stacking between
adjacent bases (see Introduction). It has been shown experimentally that
DNA stability is mainly determined by base-stacking interactions, which
contribute greatly to the dependence of the duplex stability on its
sequence (see, for instance, Ref. [1]). Therefore, it is interesting to
check whether the stacking interactions also dominate the cluster-scaling
phenomenon considered above (cf. Introduction). In order to check this,
let us use the following mapping: AT = TA = 11 for these combinations, and
A = T = G = C = 0 otherwise. An alternative mapping is: CG = GC = 11 for
these combinations, and A = T = G = C = 0 otherwise. If the stacking
interactions dominate the cluster-scaling phenomenon, then one can expect
that the cluster-scaling will be more pronounced for just these maps (cf.
Fig. 1). This means that the cluster-exponents corresponding to these
maps would be smaller than the cluster-exponents observed for the maps
considered above. As one can see by comparing Figs. 6 and 7 with Figs. 3,
4, and 5, in reality we have the opposite situation. This comparison
indicates that the stacking interactions do not dominate the
cluster-scaling phenomenon considered above (at least for the examples
given in the paper).
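
The dinucleotide mappings can be sketched as follows. This is one possible reading of "AT = TA = 11" (both sites of each listed combination are set to 1, and overlapping combinations share sites); the function name `stacked_pair_map` is hypothetical.

```python
def stacked_pair_map(seq, pairs=("AT", "TA")):
    """Mark both sites of every listed dinucleotide with 1, all others with 0."""
    seq = seq.upper()
    out = [0] * len(seq)
    for i in range(len(seq) - 1):
        if seq[i:i + 2] in pairs:
            out[i] = out[i + 1] = 1
    return out


print(stacked_pair_map("GATCGC"))                # [0, 1, 1, 0, 0, 0]
print(stacked_pair_map("GATCGC", ("CG", "GC")))  # [0, 0, 0, 1, 1, 1]
```

The resulting 0/1 sequence has the same length as the original one, so the running-density analysis of Eq. (2) applies to it without modification.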

In a realistic dynamic model of the DNA molecule one should also take into
account the cluster-scaling of the stacking interaction itself, as shown
in Figs. 6 and 7, for instance (see also Refs. [33], [31] for
heterogeneity of both pairing and stacking interactions). This can be done
in the framework of a commonly used approximation for the stacking
potential (see, for instance, Ref.

$$ W(y_i, y_{i-1}) = \frac{\Delta H_i}{C} \left( 1 - \exp\left( -b\,(y_i - y_{i-1})^2 \right) \right) \qquad (4) $$

where $\Delta H_i$ can take different values for different stacked pairs
$\{y_i, y_{i-1}\}$. Because the situation is not bivalued in this case,
this task seems to be more difficult than for the hydrogen bonds. The main
problem here is the hybridization of the nucleotides in different types of
stacked base pairs (Fig. 1). The fact that the cluster-scaling exponents
for the different types of stacked base pairs have approximately the same
value can help to solve this problem. This is not the case, however, for
the hybridization problem in a realistic model that takes into account the
cluster-scaling of both hydrogen bonds and stacking interactions (the
cluster-scaling exponents are different for hydrogen bonds, Fig. 5, and
for stacking interactions, Figs. 6 and 7).
Figure 8: Mapping of a spiky sequence (figure 8b) into a telegraph one
(figure 8a).

5 Chaotic order of the stacking interactions 

Both stochastic and deterministic processes can result in a broad-band
part of the spectrum, but the decay of the spectral power is different in
the two cases. An exponential decay with respect to frequency indicates a
chaotic time series, while a power-law decay indicates that the series is
stochastic. Not all chaotic systems have exponentially decaying spectra,
but the appearance of an exponentially decaying spectrum in the system
under consideration provides a strong indication that we are dealing with
a chaotic (deterministic) process [36]-[39]. The previously observed
spectra of DNA-sequence mappings exhibited a power-law decay, indicating a
stochastic origin of the DNA-sequence randomness [H]-[21]. It should be
noted that all the maps used in these investigations were simple numerical
mappings of the individual A, T, G, and C nucleotides. It seems, however,
that a deeper insight into the underlying physics can be obtained using
numerical maps operating with the combinations AT and TA, which represent
energy minima of the stacking interactions (cf. Fig. 1 and Ref. [15]). For
this purpose we put AT = TA = 1 for these combinations, and A = T = G = C
= 0 otherwise, in the DNA sequence under consideration (multiple adjacent
AT/TA combinations are also considered as a single '1' in this mapping,
for example: ATATTA=1). This map represents a {0,1}-valued map of the
sequence of the stacking-interaction energy minima.
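
A possible implementation of this energy-minima map is sketched below. The collapsing rule is an interpretation of the example ATATTA=1: every maximal run of sites covered by AT or TA combinations is replaced by a single symbol 1, so the mapped sequence is shorter than the original one. The function name `energy_minima_map` is hypothetical.

```python
from itertools import groupby


def energy_minima_map(seq):
    """Map a DNA string to 0/1, collapsing each AT/TA run into a single 1."""
    seq = seq.upper()
    marked = [False] * len(seq)
    for i in range(len(seq) - 1):
        if seq[i:i + 2] in ("AT", "TA"):
            marked[i] = marked[i + 1] = True
    out = []
    for is_minimum, run in groupby(marked):
        if is_minimum:
            out.append(1)                  # whole run becomes one spike
        else:
            out.extend(0 for _ in run)     # other sites are kept as zeros
    return out


print(energy_minima_map("ATATTA"))     # [1]
print(energy_minima_map("GATATTAC"))   # [0, 1, 0]
print(energy_minima_map("CATGTAC"))    # [0, 1, 0, 1, 0]
```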

An additional technical problem appears when one tries to analyze the
spectral properties of such a map. The sequence will be very spiky and the
Figure 9: Spectrum of a telegraph series corresponding to the energy minima
map constructed for the genome sequence of the Arabidopsis thaliana.

usual spectral methods (such as the fast Fourier transform or the maximum
entropy method, for instance) will be practically useless. In order to
solve this problem we use an additional mapping of the spiky series into a
telegraph signal. The spikes (symbols 1) are identical to each other, and
the dynamical information is coded in the lengths of the interspike
intervals and in their positions along the sequence. Therefore, the most
direct way is to map the spiky sequence into a telegraph signal, which has
the value -1 on one side of a spike and the value +1 on the other side. An
example of such a mapping is given in figure 8. While the dynamical
information is the same as for the corresponding spike sequence, the
spectral methods are quite applicable to the analysis of the telegraph
series.
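
The spike-to-telegraph mapping can be sketched in a few lines. The convention that the signal starts at -1 and that the site of the spike itself already carries the new value is an assumption; the function name `spikes_to_telegraph` is hypothetical.

```python
def spikes_to_telegraph(spikes, start=-1):
    """Convert a 0/1 spike train into a +/-1 telegraph signal.

    The signal keeps its value between spikes and flips sign at every spike.
    """
    state = start
    signal = []
    for s in spikes:
        if s == 1:
            state = -state
        signal.append(state)
    return signal


print(spikes_to_telegraph([0, 1, 0, 0, 1, 0]))  # [-1, 1, 1, 1, -1, -1]
```

The interspike intervals of the input become the lengths of the constant plateaus of the output, so no dynamical information is lost.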

Figure 9 shows the spectrum of the telegraph sequence corresponding to
the above-described map constructed for the genome sequence of
Arabidopsis thaliana. We used semi-log axes in Fig. 9 in order to indicate
the exponential decay of the spectrum (the straight line). It should be
noted that the value of $f_0$ (the characteristic frequency of the
exponential decay) is not the same for different species (for the human
genome, for instance, $f_0 \simeq 0.05$).
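
Assuming that the straight line in the semi-log plot corresponds to a spectrum of the form $E(f) \propto \exp(-f/f_0)$ (the explicit form is not given in the text), the decay scale can be estimated by a linear least-squares fit of $\ln E(f)$ against $f$. The helper `fit_decay_scale` below is a hypothetical sketch, checked here on synthetic data only.

```python
import math


def fit_decay_scale(freqs, power):
    """Fit ln E(f) = c - f / f0 by least squares and return f0."""
    ys = [math.log(p) for p in power]
    mf = sum(freqs) / len(freqs)
    my = sum(ys) / len(ys)
    slope = (sum((f - mf) * (y - my) for f, y in zip(freqs, ys))
             / sum((f - mf) ** 2 for f in freqs))
    return -1.0 / slope


# Synthetic check: a spectrum that decays exactly as exp(-f / 0.05).
freqs = [0.01 * k for k in range(1, 11)]
power = [3.0 * math.exp(-f / 0.05) for f in freqs]
print(round(fit_decay_scale(freqs, power), 6))  # 0.05
```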

Many of the well-known chaotic attractors ('Lorenz', 'Rossler', etc.)
exhibit exponentially decaying spectra [37]. Here, for comparison with
Fig. 9, we consider a chaotic spectrum generated by the Kaplan-Yorke map
[ID] (the relevance of this choice will become clear immediately). In the
Langevin
Figure 10: Spectrum of a chaotic solution of the Kaplan-Yorke map.



approach to Brownian motion the equation of motion is

$$ \dot{y} = -\gamma y + F(t) \qquad (5) $$

where the fluctuating kick force on the particle is a Gaussian white
noise: $F(t) = \sum_n \eta_n \delta(t - n\tau)$, and $y(t)$ takes values
in $R^m$. One can assume [JT] that the evolution of the kick strengths is
determined by a discrete-time dynamical system $T$ on the phase space and
projected onto $R^m$ by a function $f$:

$$ \eta_n = f(x_{n-1}), \qquad x_{n+1} = T x_n \qquad (6) $$

Then the solution of Eq. (5) is

$$ y(t) = e^{-\gamma (t - n\tau)} y_n \qquad (7) $$

where $n$ equals the integer part of the ratio $t/\tau$ and the recurrence

$$ x_{n+1} = T x_n, \qquad y_{n+1} = a y_n + f(x_n) \qquad (8) $$

provides the value of $y_n$ (with $a = e^{-\gamma\tau}$). In a certain
sense the dynamical system (8) is equivalent to the stochastic
differential equation (5). In the generalization related to Eq. (8) the
force $F(t)$ can be considered as a non-Gaussian process which is
determined by $f$ and $T$. The Kaplan-Yorke map [40],[42],[43] is a
particularly simple case of this generalization:

$$ T x = 2x \ (\mathrm{mod}\ 1), \qquad f(x) = \cos 4\pi x \qquad (9) $$
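
A minimal sketch of iterating the Kaplan-Yorke map is given below. Naive floating-point iteration of $2x \ (\mathrm{mod}\ 1)$ collapses to zero after about 53 steps, because each doubling shifts one bit out of the mantissa. The sketch therefore represents the initial condition by a random bit string and reads x_n from a sliding 53-bit window, which is an implementation choice and not something specified in the text. The function name and its parameters are hypothetical.

```python
import math
import random


def kaplan_yorke(n_steps, a=0.2, seed=0, precision=53):
    """Iterate x -> 2x (mod 1), y -> a*y + cos(4*pi*x) and return the y values.

    The binary digits of x_0 are random bits; doubling modulo 1 is a shift
    of this bit string, so x_n is read from bits n ... n + precision - 1.
    """
    rng = random.Random(seed)
    bits = [rng.getrandbits(1) for _ in range(n_steps + precision)]
    y = 0.0
    ys = []
    for n in range(n_steps):
        x = sum(bits[n + k] * 2.0 ** -(k + 1) for k in range(precision))
        y = a * y + math.cos(4.0 * math.pi * x)
        ys.append(y)
    return ys


series = kaplan_yorke(200, a=0.2)
print(min(series), max(series))
```

Since |cos| is bounded by 1, the values of y stay within 1/(1 - a), i.e. within 1.25 for a = 0.2.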
Figure 11: Autocorrelation function corresponding to the spectrum shown in
Fig. 9.

Figure 10 shows the spectrum of a chaotic solution of the Kaplan-Yorke
map ($a = 0.2$). We used semi-log axes in this figure in order to indicate
the exponential decay of the spectrum (the straight line).

Although the exponential part of the spectrum in Fig. 9 apparently extends
to the frequencies $f \simeq 0.3$, for frequencies larger than
$f \simeq 0.2$ (i.e. for scales $n < 5$) the corresponding telegraph
signal is a random one. This can be seen from figure 11, which shows the
autocorrelation function corresponding to the spectrum shown in Fig. 9
(the correlation length is 2). We have a chaotic order on the large
scales only (see also the next section, Fig. 13). Fig. 12 shows the
analogous autocorrelation function for the Kaplan-Yorke solution.
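
For reference, a normalized autocorrelation function of a telegraph series can be computed as in the sketch below. The biased estimator (division by the full series length at every lag) is an assumption; the estimator used for the figures is not specified in the text.

```python
def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(k) / C(0) for lags k = 0 ... max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


# A strictly alternating telegraph signal is anti-correlated at lag 1.
signal = [1, -1] * 50
print(autocorrelation(signal, 2))
```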

6 Large-scale chaotic coherence 

In double-stranded DNA the two strands are complementary in a local
sense, i.e. the nucleotide bases pair up such that A always pairs with T
and G always pairs with C. But what can one say about nonlocal coherence
of the nucleotide sequences in the two strands? Actually, because of the
local complementarity, this question can be answered by studying the
coherence between the A-dominated (map: A=1, T=C=G=0) and T-dominated
(map: T=1, A=C=G=0) sequences along a single strand (and analogously for
the C and G sequences). Due to the complementarity of the A and T
nucleotides, the chaotic (deterministic) order of the energy minima of the
stacking interactions should result in a large-scale coherence of the
sequences of the two DNA strands. This is mainly relevant to the A and T
sequences, because it is just the AT/TA combinations that correspond to
the energy minima. A certain (however considerably smaller) large-scale
coherence can also appear in the C and G sequences as a secondary effect
(see Fig. 13 and the explanations below). In order to compare the coherent
properties of the sequences of the two DNA strands we use cross-spectral
analysis. The cross spectrum $E_{1,2}(f)$ of two processes $x_1(t)$ and
$x_2(t)$ is defined by the Fourier transformation of the cross-correlation
function normalized by the square root of the product of the univariate
power spectra $E_1(f)$ and $E_2(f)$:

$$ E_{1,2}(f) = \frac{\sum_\tau \langle x_1(t) x_2(t-\tau) \rangle \exp(-i 2\pi f \tau)}{2\pi \sqrt{E_1(f) E_2(f)}} \qquad (10) $$

where the bracket $\langle \ldots \rangle$ denotes the expectation value.
The cross spectrum can be decomposed into the phase spectrum
$\phi_{1,2}(f)$ and the coherency $C_{1,2}(f)$:

$$ E_{1,2}(f) = C_{1,2}(f) e^{-i \phi_{1,2}(f)} \qquad (11) $$

Because of the normalization of the cross spectrum, the coherency ranges
from $C_{1,2}(f) = 0$, i.e. no linear relationship between $x_1(t)$ and
$x_2(t)$ at $f$, to $C_{1,2}(f) = 1$, i.e. a perfect linear relationship.
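
A coherency estimate of this kind can be sketched with a segment-averaged (Welch-like) estimator, as below. This is an illustrative stand-in, not the estimator used for the figure: the segment length, the absence of windowing and overlap, and the function names are all assumptions, and a direct DFT is used only to keep the example free of external libraries.

```python
import cmath
import math
import random


def dft(segment):
    """Direct DFT at the positive frequencies k = 1 ... len(segment) // 2."""
    n = len(segment)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(segment))
            for k in range(1, n // 2 + 1)]


def coherency(x1, x2, seg_len):
    """Segment-averaged coherency |<X1 X2*>| / sqrt(<|X1|^2> <|X2|^2>)."""
    n_freq = seg_len // 2
    s11 = [0.0] * n_freq
    s22 = [0.0] * n_freq
    s12 = [0j] * n_freq
    for start in range(0, min(len(x1), len(x2)) - seg_len + 1, seg_len):
        f1 = dft(x1[start:start + seg_len])
        f2 = dft(x2[start:start + seg_len])
        for k in range(n_freq):
            s11[k] += abs(f1[k]) ** 2
            s22[k] += abs(f2[k]) ** 2
            s12[k] += f1[k] * f2[k].conjugate()
    freqs = [(k + 1) / seg_len for k in range(n_freq)]
    coh = [abs(s12[k]) / math.sqrt(s11[k] * s22[k]) for k in range(n_freq)]
    return freqs, coh


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(512)]
y = [rng.gauss(0.0, 1.0) for _ in range(512)]
_, same = coherency(x, x, 16)    # identical series: coherency 1 everywhere
_, indep = coherency(x, y, 16)   # independent series: low coherency
print(min(same), sum(indep) / len(indep))
```

Averaging over segments is essential: with a single segment the estimate equals 1 at every frequency regardless of the data.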

Figure 13 shows the coherency of the A (or T) dominated sequences on the
two strands of the Arabidopsis DNA duplex (solid curve), and the coherency of
Figure 13: Coherency of the two DNA strands' sequences for the Arabidopsis
thaliana.

the C (or G) dominated sequences (dashed curve). While the A (or T)
dominated sequences exhibit rather high (> 0.5) coherency in the
low-frequency domain $f < 0.15$ (i.e. for length periods > 7 nucleotides,
cf. the last paragraph of the previous Section), the C (or G) dominated
sequences exhibit low coherency even in this domain. The latter (low)
coherency is a secondary effect of the former one (see above).

It should also be noted that the C/G coherency has a strong burst (the
peak marked by an arrow in Fig. 13) in a narrow vicinity of the frequency
$f \simeq 0.33$. This resonance peak rises from a very low coherency
background and corresponds to the codons' period $T = 3$ nucleotides
($T = 1/f$). Let us recall that a codon is a sequence of three adjacent
nucleotides constituting the genetic code for a specific amino acid
residue in a polypeptide chain. Therefore, one can speculate that the two
complementary DNA strands can have relatively strong coherence related to
the genetic code content in the case of the C/G containing codons, whereas
the large-scale chaotic coherence related to the large-scale chaotic order
in the A/T containing codons can suppress the genetic coherence (cf. Fig.
13). However, this speculation leads us beyond the purely physical
framework of the present paper.
Figure 14: The standard deviation of $\delta n(r)$ vs. $r$ for the energy
minima {0,1}-map used in Section 5 (in log-log scales). The straight lines
(the best fit) indicate the scaling law Eq. (2).

7 Discussion 



For Gaussian-like processes there is a very close relationship between
their spectral and cluster-scaling properties [15]. If the system under
consideration is non-Gaussian, then this relationship is broken. Section 5
provides a good example of such a situation. Indeed, figure 14 shows the
standard deviation of $\delta n(r)$ vs. $r$ for the energy minima
{0,1}-mapping used in Section 5. The straight lines (the best fit)
indicate the scaling law Eq. (2). Thus, for the considered non-Gaussian
system a robust cluster-scaling (Fig. 14) can co-exist with a non-scaling
spectrum (Fig. 9). Therefore, the long-range correlations (which
correspond to power-law scaling spectra) in the human genome [18]-[23] are
not directly related to its cluster-scaling properties. These two types of
scaling behavior are independent for non-Gaussian systems, which makes the
cluster-scaling method an independent tool for studying these systems.
Moreover, while the simple maps previously used for the spectral
computations indicate stochastic behavior, the intrinsic map for the
energy minima of the stacking interactions indicates chaotic
(deterministic) interactions underlying the stochastic system on large
scales. This large-scale chaotic order can help to resolve the random gap
problem mentioned in the Introduction and should affect the DNA-duplex
dynamics, creating, in particular, the large-scale coherence between the
two strands of the DNA duplex.